ABSTRACT

The present paper deals with the investigation of the HIV-RT
inhibitory activity of 45 compounds. Different topological
parameters, including distance-based topological and
connectivity indices, were chosen for modeling the pIC50
activity of these compounds. Multiple linear regression (MLR)
shows that the best model is a five-parametric model
containing Jhete, Jhetp, logP, ⁴χ and χeq. The QSAR model
derived from these descriptors was found to be statistically
significant and exhibited good predictive power. The squared
correlation (R2) between calculated and experimental activity
was 0.79, and the reliability of the model was validated with
the leave-one-out cross-validation method. Its predictive
capability was further validated using a test set of 11
inhibitors similar to the training set of inhibitors.

Keywords:  HIV-RT,  connectivity  indices,  log  P,  QSAR, 
MLR 

Introduction 

When the human immunodeficiency virus (HIV) attacks a living
system, the result is very serious: it breaks down the body's
immune system and leads to the deadly disease known as
AIDS [1,2]. This virus is a type of retrovirus that has a
reverse transcriptase (RT). Scientists have worked in this
field for many years, and a key target for addressing this
problem has been found in HIV-RT, an enzyme which converts
viral RNA into double-stranded viral DNA, which allows HIV to
integrate into human

Computational  Details 

The structural details of the 45 compounds are given in
Table 1. The data set is divided into a training set (34
compounds) and a test set (11 compounds). The topological
indices used for modeling HIV-RT inhibitors were calculated
using DRAGON [13] software and are presented in Table 2. This
table also includes the observed activity values (pIC50) of
the compounds used in the present study. The correlation
among the topological indices was calculated using a
correlation matrix, which is reported in Table 3. While
performing variable selection for multiple regression using
all the DRAGON descriptors together with equalized
electronegativity, we observed that, of all the descriptors
used, equalized electronegativity (χeq) is the descriptor with
which to start one-variable modeling of HIV-RT inhibition.
Moreover, the better models obtained during variable selection
almost invariably contain χeq as one of the correlating
descriptors, indicating that χeq plays a dominating role in
modeling HIV-RT inhibition. The results obtained herein
indicate that more reliable models can be obtained when χeq is
combined with other topological indices.

At this stage it is worth mentioning that when two or more
atoms of initially different electronegativities combine
chemically, they adjust to the same intermediate
electronegativity within the compound. That is, electrons flow
from the less electronegative atom to the more electronegative
atom, creating a partial positive charge on the former and a
partial negative charge on the latter. As the positive charge
on the electropositive atom increases, its effective nuclear
charge increases; hence, its electronegativity increases. The
same trend occurs in the opposite direction for the more
electronegative atom, until the two have the same
electronegativity. This principle has gained wide acceptance
[12-16]. In the framework of Sanderson's principle [17], it is
generally believed that the partial charge acquired by an atom
through chemical combination is proportional to the difference
between the final equalized electronegativity and the initial
pre-bonded electronegativity. The charge conservation equation
then leads to a general expression for the equalized
electronegativity (χeq):

$$\chi_{eq} = \frac{N}{\sum (v/\chi)} \qquad [1]$$

where N is the total number of atoms in the species formula,
v is the number of atoms of a particular element in the
species formula and χ is the electronegativity of that
element. Earlier, Agrawal and Khadikar [18] successfully used
equalized electronegativity for modeling the toxicity of
nitrobenzene derivatives, obtaining reasonably good results by
combining χeq with other topological indices. There, too, they
observed that, of the several descriptors used, it was χeq
alone which gave a statistically significant one-variable
model. In the present study, therefore, we have used χeq along
with a few other topological indices for modeling HIV-RT
inhibitors.
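As a rough illustration of Eq. 1, the short Python sketch below evaluates the equalized electronegativity of a species from its formula. The function name, the water example and the Pauling-scale electronegativities (H = 2.20, O = 3.44) are assumptions chosen only for illustration; the electronegativity scale actually used in this study is not stated here.

```python
def equalized_electronegativity(composition, electronegativity):
    """Return chi_eq = N / sum(v / chi) for a species formula (Eq. 1).

    composition       : dict, element symbol -> number of atoms v
    electronegativity : dict, element symbol -> electronegativity chi
    """
    total_atoms = sum(composition.values())  # N in Eq. 1
    denominator = sum(count / electronegativity[element]
                      for element, count in composition.items())
    return total_atoms / denominator


# Assumed Pauling-scale values, used here only as an example input.
PAULING = {"H": 2.20, "O": 3.44}

chi_eq_water = equalized_electronegativity({"H": 2, "O": 1}, PAULING)
print(f"chi_eq(H2O) = {chi_eq_water:.3f}")
```

Because Eq. 1 is a weighted harmonic mean, the result always lies between the smallest and largest atomic electronegativities in the formula.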

Balaban Heteroatom Index (Jhet)

This is an extension of the Balaban index (J) [19-21] to
molecules containing heteroatoms. For heteroatoms, atoms of
different kinds are differentiated by modifying the
corresponding elements of the distance matrix D. For instance,
the following modification was suggested for the diagonal
elements:

$$(D)_{ii} = 1 - \frac{Z_C}{Z_i} \qquad [2]$$

where Z_C = 6 and Z_i is the total number of electrons of
atom i, that is, the atomic number of the given element.

The off-diagonal elements of the modified distance matrix for
heteroatom systems are given by the following equation:

$$(D)_{ij} = \sum_{r} k_r \qquad [3]$$

where the summation is over r bonds.

The bond parameter k_r is given by the following expression:

$$k_r = \frac{1}{w_r} \times \frac{Z_C^{2}}{Z_i Z_j} \qquad [4]$$

where w_r is the bond weight, with values of 1, 1.5, 2 and 3
for single, aromatic, double and triple bonds, respectively.

Randic Molecular Connectivity Index

The branching or connectivity index originally defined by
Randic [22] is referred to as the path-1 molecular
connectivity index, ¹χ. The value of ¹χ reflects both the size
and the branching of the structure. It is related to the size
of the molecule because, when extra atoms and bonds are added,
more terms are added to the summation and the value grows. ¹χ
is also related to the degree of branching of the molecule
because, when more branching occurs, the denominators of those
terms become larger and the terms themselves become smaller,
thus decreasing the overall value of the index. The Randic
molecular connectivity index is defined as

$$^{1}\chi = \sum_{edges\ ij} (D_i D_j)^{-1/2} \qquad [5]$$

where D_i and D_j are the degrees (atom connectivities) of the
vertices i and j joined by an edge of the molecular graph.

Randic indices of different orders

$$^{m}\chi = \sum_{path} (D_i D_j \cdots D_k)^{-1/2} \qquad [6]$$

logP - Partition Coefficient [23-24] (Lipophilicity)

$$\log P_{oct/wat} = \log \left( \frac{[solute]_{octanol}}{[solute]^{un\text{-}ionized}_{water}} \right) \qquad [7]$$


Different models were obtained using one to five correlating
parameters. The quality and the regression parameters of these
models were also calculated. The activity calculated using the
most appropriate model (model 18) is compared with the
experimental activity; Fig. 1 shows the correlation between
the experimental and estimated activities of the training set.
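To make the path-1 connectivity index of Eq. 5 concrete, the following Python sketch computes it for two small hydrogen-suppressed graphs. The function name, the edge-list representation and the butane examples are illustrative assumptions and are not taken from the DRAGON calculation used in this work.

```python
from math import sqrt


def randic_index(edges):
    """Path-1 connectivity index: sum over edges of (D_i * D_j) ** -0.5."""
    degree = {}
    for i, j in edges:
        # Each edge contributes one to the degree of both of its end atoms.
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return sum(1.0 / sqrt(degree[i] * degree[j]) for i, j in edges)


# Hydrogen-suppressed carbon skeletons; the atom labels are arbitrary.
n_butane = [(1, 2), (2, 3), (3, 4)]    # linear chain of four atoms
isobutane = [(1, 2), (2, 3), (2, 4)]   # one central atom with three neighbours

print(f"n-butane:  {randic_index(n_butane):.4f}")
print(f"isobutane: {randic_index(isobutane):.4f}")
```

The branched skeleton gives the smaller value, in line with the statement that more branching enlarges the denominators and lowers the index.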

Results  and  Discussion: 

The compounds used in the present study are listed in Table 1.
Those marked with * are the compounds used for the test set;
the remaining compounds form the training set. The topological
parameters, along with the biological activity in the form of
pIC50, are summarized in Table 2. Table 3 presents the
correlation matrix showing the intercorrelation among all the
parameters. A close observation of this table clearly
indicates that:

1. No mono-parametric correlation is capable of modeling the
pIC50 value of the present set of compounds; however, logP,
Jhetp and χeq are the most suitable parameters to be used in
multiparametric modeling.

2. ⁴χ and Jhete are moderately correlated (r = 0.7188),
whereas Jhete and Jhetp also show good correlation
(r = -0.6138). However, this correlation has a lower magnitude
than the previous one.

The entire data set is divided into training and test sets.
The training set was subjected to regression analysis using
NCSS software. All the statistically significant models, along
with their quality, are summarized in Table 4. This table also
includes the values of Pogliani's quality factor Q [25-27],
which is the ratio of R and Se.
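Since Q is defined simply as R/Se, the tabulated values can be cross-checked from R2 and Se. The sketch below does this for models 14, 17 and 18 using the R2 and Se values reported in Table 4; the function and variable names are illustrative only.

```python
from math import sqrt


def pogliani_q(r_squared, se):
    """Quality factor Q = R / Se, where R is the square root of R2."""
    return sqrt(r_squared) / se


# model number: (R2, Se) as reported in Table 4
reported = {14: (0.6156, 0.0677), 17: (0.7332, 0.0574), 18: (0.7926, 0.0515)}

q_values = {model: pogliani_q(r2, se) for model, (r2, se) in reported.items()}
for model, q in q_values.items():
    print(f"model {model}: Q = {q:.4f}")
```

The recomputed values agree with the reported Q of 11.5847, 14.9176 and 17.2870 to within rounding of the inputs.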

The best three-, four- and five-parametric models, which are
the most significant, are given below:

(i) Three-variable model (Model 14, Table 4)

pIC50 = -2.4300 (±0.7833) Jhetp + 0.4901 (±0.1521) logP
        - 6.1642 (±1.2395) χeq + 23.6746

N = 34, R2 = 0.6156, R2A = 0.5771, Se = 0.0677, F = 16.012,
Q = 11.5847

(ii) Four-variable model (Model 17, Table 4)

pIC50 = -11.4322 (±2.3692) Jhete - 4.5078 (±0.8445) Jhetp
        + 2.3568 (±0.4961) ⁴χ - 5.3358 (±1.0854) χeq + 33.1638

N = 34, R2 = 0.7332, R2A = 0.6964, Se = 0.0574, F = 19.924,
Q = 14.9176

(iii) Five-variable model (Model 18, Table 4)

pIC50 = -9.6436 (±2.2178) Jhete - 4.0804 (±0.7726) Jhetp
        + 0.3416 (±0.1207) logP + 2.1044 (±0.4540) ⁴χ
        - 5.2982 (±0.9741) χeq + 28.6878

N = 34, R2 = 0.7926, R2A = 0.7555, Se = 0.0515, F = 21.399,
Q = 17.2870

1. On going from the three-variable model (model 14) to the
   four-variable model containing ⁴χ (model 17), R2 improves
   significantly, from 0.6156 to 0.7332. The adjusted R2 also
   changes from 0.5771 to 0.6964, suggesting that the inclusion
   of ⁴χ is favourable.

2. Further improvement is observed when Jhete, Jhetp, logP, ⁴χ
   and χeq are taken together, resulting in a five-parametric
   model (model 18, Table 4). Here R2 is 0.7926 and Pogliani's
   quality factor has a value of 17.2870. This value is the
   highest among all the models.

Therefore, the five-parametric model is the best model for
modeling the pIC50 activity of the compounds used in the
present study.
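The five-variable model can be applied directly to a row of descriptor values. The sketch below, with illustrative dictionary keys and function name, uses the reported model 18 coefficients and the descriptor values listed for compound 1 in Table 2; the result can be compared with the estimated pIC50 of 7.66 given for that compound in Table 5.

```python
# Coefficients and intercept of the five-variable model (model 18).
MODEL_18 = {
    "Jhete": -9.6436,
    "Jhetp": -4.0804,
    "logP": 0.3416,
    "chi4": 2.1044,     # path-4 connectivity index
    "chi_eq": -5.2982,  # equalized electronegativity
}
INTERCEPT = 28.6878


def predict_pic50(descriptors):
    """Evaluate the linear model for one compound's descriptor values."""
    return INTERCEPT + sum(coef * descriptors[name]
                           for name, coef in MODEL_18.items())


# Descriptor values of compound 1 as listed in Table 2.
compound_1 = {"Jhetp": 1.443, "Jhete": 1.966, "chi4": 7.136,
              "logP": 5.03, "chi_eq": 2.437}

estimate = predict_pic50(compound_1)
print(f"estimated pIC50 = {estimate:.2f} (observed 7.85)")
```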

From the calculations, it is observed that statistically
significant models start to appear from the three-variable
model (model 14, Table 4). Moreover, the proposed models
invariably contain equalized electronegativity (χeq) as one of
the correlating parameters. This demonstrates the dominating
role of χeq in modeling HIV-RT inhibitors. Further, two of the
three proposed models (models 14 and 18) contain logP as a
correlating parameter. This shows that hydrophobicity is
another important parameter for the exhibition of the
activity. It is interesting to mention that in all the
proposed models the coefficient of χeq is negative, while that
of logP, wherever it appears, is positive. This means that an
increase in hydrophobicity and a decrease in equalized
electronegativity are favourable for the exhibition of the
activity. The proposed models show that Balaban-type indices
also contribute to the activity. Their negative coefficients
throughout indicate that a decrease in their magnitude favours
the exhibition of the activity.

It is interesting to mention that in all the above cases, both
R2 and R2A go on increasing with each addition of a
correlating parameter. R2 will naturally never decrease when a
correlating parameter is added. R2A, however, increases only
if the added parameter contributes enough to offset the loss
of a degree of freedom; otherwise it decreases. Since in our
case R2A goes on increasing, the added parameters are
favourable for modeling the activity.
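The behaviour of the adjusted R2 can be illustrated with the usual definition R2A = 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the number of compounds and k the number of descriptors. That this is the definition used by the regression software is an assumption, although it does reproduce the reported values for models 14, 17 and 18. The function name is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for n observations and k descriptors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


N_TRAINING = 34  # compounds in the training set

# model number: (R2, number of descriptors k)
models = {14: (0.6156, 3), 17: (0.7332, 4), 18: (0.7926, 5)}

r2_adj = {m: adjusted_r2(r2, N_TRAINING, k) for m, (r2, k) in models.items()}
for model, value in r2_adj.items():
    print(f"model {model}: adjusted R2 = {value:.4f}")

# A sixth descriptor that left R2 unchanged would lower the adjusted value.
penalised = adjusted_r2(0.7926, N_TRAINING, 6)
print(f"hypothetical useless sixth descriptor: {penalised:.4f}")
```

The last line shows the penalty at work: with R2 held at 0.7926, going from five to six descriptors reduces the adjusted value.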

The best five-parametric model was used for estimating the
pIC50 values of the training set. These values are reported in
Table 5. The predictive potential of this model was assessed
by plotting the observed pIC50 values against the estimated
pIC50 values. This correlation is depicted in Fig. 1.

To validate the models, cross-validation parameters were
calculated; they are reported in Table 6. PRESS is a good
estimate of the real predictive power of a model. If PRESS is
smaller than SSY, the model predicts better than chance and
can be considered statistically significant. Table 6 shows
that, in this regard, all the models proposed by us are better
than chance and are statistically significant. The ratio
PRESS/SSY can be used to calculate the approximate confidence
interval of the prediction of new compounds. To be a
reasonably good QSAR model, this ratio should be smaller than
0.4. Models 17 and 18 have ratios below 0.4 (0.3639 and
0.2617, respectively), whereas model 14 does not (0.6245);
model 18, with the lowest ratio, therefore has the best
predictive power. The developed models were cross-validated by
the leave-one-out method. Another cross-validated parameter
related to the uncertainty of prediction, the PSE, has also
been calculated. The lowest value of PSE for model 18 supports
its highest predictive potential. The low values of PSE and
SPRESS and the high value of R2cv suggest that the
five-parametric model is the most appropriate for predicting
the pIC50 values of the present set of compounds.
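The derived columns of Table 6 follow from PRESS and SSY alone, which allows a quick consistency check. The sketch below assumes the common definitions R2cv = 1 - PRESS/SSY and SPRESS = sqrt(PRESS/(n - k - 1)); these are assumptions, not definitions quoted from the paper. It also evaluates PRESS/n, which happens to coincide with the tabulated PSE values. All names are illustrative.

```python
from math import sqrt

N_TRAINING = 34  # compounds in the training set

# model number: (PRESS, SSY, number of descriptors k), from Table 6
TABLE_6 = {
    14: (9.6744, 15.4904, 3),
    17: (6.714, 18.4508, 4),
    18: (5.2195, 19.9452, 5),
}


def cross_validation_stats(press, ssy, n, k):
    """Derive the cross-validation summary values from PRESS and SSY."""
    ratio = press / ssy
    return {
        "press_over_ssy": ratio,
        "r2_cv": 1.0 - ratio,                  # assumed definition
        "s_press": sqrt(press / (n - k - 1)),  # assumed definition
        "press_over_n": press / n,             # matches the tabulated PSE
    }


stats = {model: cross_validation_stats(press, ssy, N_TRAINING, k)
         for model, (press, ssy, k) in TABLE_6.items()}

for model, values in stats.items():
    side = "below" if values["press_over_ssy"] < 0.4 else "above"
    rounded = {name: round(value, 4) for name, value in values.items()}
    print(f"model {model}: {rounded} -> PRESS/SSY is {side} 0.4")
```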

This model was further validated by estimating the pIC50
values for the test set. The results are reported in Table 7.
A close look at this table reveals that the estimated
activities are in reasonable agreement with the observed
activities for most compounds; the largest deviation is for
compound 4 (residual -1.61). Hence, model 18 can be used for
modeling the pIC50 activity of the present set of compounds.
The squared correlation (R2) for the test set is 0.835, which
is better than that of the training set (0.7926).
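The test-set correlation can be recomputed from the observed and estimated values in Table 7. The following sketch uses a plain Pearson correlation written out by hand; the helper name is illustrative. With these rounded table values the squared correlation comes out at roughly 0.84, close to the quoted 0.835.

```python
from math import sqrt

# Observed and estimated pIC50 of the 11 test-set compounds (Table 7).
observed = [5.97, 7.61, 6.80, 9.22, 8.54, 8.27, 8.46, 8.24, 9.00, 7.29, 8.89]
estimated = [7.58, 8.29, 7.52, 9.11, 8.68, 8.60, 8.87, 8.49, 8.84, 7.76, 8.39]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(observed, estimated)
r_squared = r * r
residuals = [o - e for o, e in zip(observed, estimated)]
print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
print(f"largest absolute residual = {max(abs(d) for d in residuals):.2f}")
```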

Conclusions: 

The following conclusions may be drawn.

1. Topological indices in combination with equalized
   electronegativity are best suited for modeling the anti-HIV
   activity of the present set of compounds.

2. Topological indices alone are not able to model the
   biological activity of the present set of compounds.

3. The negative coefficients of the Balaban-type indices
   suggest that they have a retarding effect on the pIC50
   values; hence, in the future design of potent compounds,
   lower values of these indices should be preferred.

4. A higher value of logP will favour the activity, as its
   coefficient is positive in the model.

5. Owing to the negative coefficient of χeq, a low equalized
   electronegativity value will enhance the biological activity.


Table 1: Structural details of compounds used in the present
study.

Compd.  R1              X     R2
1       2,4-diCl        CH2   NH2
2       2,4,6-triCl     CH2   NH2
3       2,4,6-triMe     CH2   NH2
*4      2,6-diCl        CH2   Cl
5       2,6-diCl        CH2   NHMe
6       2,6-diCl        CH2   NMe2
7       2,6-diCl        CH2   NHOMe
*8      2,6-diCl        CH2   NHOEt
9       2,6-diCl        CH2   SMe
10      2,6-diCl        CH2   SH
11      2,6-diCl        CH2   OMe
*12     2,6-diCl        CH2   OH
13      2,6-diCl        CH2   F
14      2,4,6-triMe     NH    H
15      2,6-diMe-4-Me   NH    H
*16     2,6-diMe-4-Br   NH    H
17      2,6-diMe-4-CN   NH    H
18      2,4,6-triMe     O     H
19      2,4,6-triMe     S     H
*20     2,4,6-triMe     S     NH2
21      2,6-diMe        O     NH2
22      2,6-diMeO       O     NH2
23      2,4,6-triCl     O     NH2
*24     2,4,6-triBr     O     NH2
25      2,6-diCl-4-F    O     NH2
26      2,6-diBr-4-Me   O     NH2
27      2,6-diMe-4-Br   O     NH2
*28     2,6-diMe-4-Cl   O     NH2
29      2,4,6-triMe     NH    H
30      2,6-diMe-4-CN   O     H
31      2,6-diMe-4-Br   O     H
*32     2,6-diMe-4-Br   S     H
33      2,4,6-triMe     O     H
34      2,4-diBr-6-F    NH    H
35      2,4,6-triCl     NH    H
*36     2,6-diCl-6-Me   NH    H
37      2,6-diBr-4-Me   NH    H
38      2,6-diMe-4-Br   NH    H
*40

Table 2. Calculated values of the topological parameters for the compounds used in the present study, with
equalized electronegativity values and their experimental pIC50 values

Compd. No.  pIC50  Jhetp  Jhete  ⁴χ     logP  χeq
1           7.85   1.443  1.966  7.136  5.03  2.437
2           8.85   1.487  2.008  7.755  5.55  2.460
3           9.10   1.478  1.991  7.755  5.40  2.356
4*          5.97   1.476  2.003  7.230  6.00  2.460
5           8.15   1.474  2.044  7.368  5.39  2.419
6           6.78   1.491  2.093  7.493  6.08  2.404
7           8.21   1.412  2.081  7.679  5.02  2.438
8*          7.61   1.373  2.088  7.763  5.36  2.421
9           7.74   1.523  2.038  7.368  6.47  2.419
10          7.03   1.479  1.999  7.230  5.90  2.437
11          7.92   1.448  2.055  7.368  5.56  2.419
12*         6.80   1.453  2.004  7.230  5.53  2.456
13          5.28   1.439  2.006  7.230  5.71  2.805
14          9.52   1.296  1.989  7.303  6.39  2.366
15          9.30   1.307  1.998  7.303  6.94  2.430
16*         9.22   1.301  1.993  7.303  6.66  2.395
17          9.00   1.311  1.993  7.418  5.70  2.401
18          9.22   1.178  2.046  7.303  6.28  2.380
19          8.70   1.557  1.955  7.303  6.84  2.365
20*         8.54   1.594  2.026  7.755  6.63  2.369
21          8.51   1.203  2.114  7.230  5.56  2.397
22          7.51   1.166  2.180  7.869  4.27  2.433
23          8.62   1.220  2.137  7.755  6.45  2.505
24*         8.27   1.227  2.131  7.755  6.89  2.492
25          8.14   1.198  2.140  7.755  5.94  2.520
26          8.70   1.223  2.127  7.755  6.62  2.449
27          8.82   1.218  2.122  7.755  6.34  2.414
28*         8.46   1.216  2.124  7.755  6.20  2.417
29          9.00   1.319  1.981  7.303  6.21  2.351
30          8.96   1.209  2.043  7.418  5.41  2.398
31          8.54   1.201  2.042  7.303  6.37  2.393
32*         8.24   1.598  1.952  7.303  6.94  2.376
33          8.54   1.197  2.038  7.303  6.10  2.364
34          9.22   1.303  1.999  7.303  6.38  2.466
35          9.15   1.326  1.999  7.303  6.60  2.460
36*         9.00   1.323  1.993  7.303  6.47  2.417
37          8.68   1.330  1.990  7.303  6.76  2.409
38          8.64   1.324  1.985  7.303  6.48  2.378
39          6.89   1.424  1.968  6.801  4.51  2.415
40*         7.29   1.155  2.147  7.755  4.92  2.551
41          8.40   1.191  2.051  7.418  5.59  2.417
42          8.44   1.590  1.948  7.303  6.66  2.349
43          8.68   1.220  2.129  7.755  6.47  2.518
44*         8.89   1.187  2.056  7.303  6.83  2.517
45          9.00   1.332  2.060  7.755  6.18  2.371

* Test set


Table 3: Correlation matrix

        pIC50    Jhetp    Jhete    ⁴χ       logP     χeq
pIC50    1.0000
Jhetp   -0.3238   1.0000
Jhete   -0.0783  -0.6138   1.0000
⁴χ       0.1741  -0.2610   0.7188   1.0000
logP     0.4431   0.0404  -0.2120  -0.0456   1.0000
χeq     -0.5348  -0.1544   0.2816   0.0888  -0.1304   1.0000
Table 4. Regression parameters and quality of correlation of training set

Model  Parameters  Ai (i = 1-5)         B        Se      R2      R2A     F-ratio  Q = R/Se
No.    used
1      Jhete        -1.0896 (±2.5160)   10.6063  0.1055  0.0058  0.0000   0.188    0.7219
2      ⁴χ            0.8480 (±0.6237)    2.0823  0.0251  0.0546  0.0251   1.848    9.3094
3      Jhetp        -2.4445 (±1.1434)   11.6545  0.0989  0.1250  0.0976   4.571    3.5749
4      logP          0.5715 (±0.2138)    4.9694  0.0956  0.1825  0.1570   7.140    4.4686
5      χeq          -6.4581 (±1.5531)   24.0632  0.0852  0.3508  0.3305  17.290    6.9517
7      ⁴χ            1.0213 (±0.4938)   17.0517  0.0812  0.4295  0.3927  11.670    8.0710
       χeq          -6.6968 (±1.4837)
8      Jhetp        -2.5104 (±0.8935)   27.5771  0.0773  0.4826  0.4492  14.550    8.9870
       χeq          -6.5212 (±1.4089)
9      logP          0.5051 (±0.1719)   20.1569  0.0766  0.4922  0.4599  15.025    9.1589
       χeq          -6.0923 (±1.4011)
10     χeq          -6.3164 (±1.4361)   17.4473  0.0770  0.5030  0.4533  10.122    9.2107
       Jhete         1.5333 (±1.8988)
       logP          0.5263 (±0.1749)
11     Jhete       -14.4222 (±3.0483)   24.9501  0.0764  0.5109  0.4620  10.445    9.3557
       Jhetp        -5.1418 (±1.1110)
       ⁴χ            2.6533 (±0.6555)
12     Jhetp        -2.1460 (±0.9084)   22.1189  0.0758  0.5190  0.4709  10.790    9.5042
       ⁴χ            0.7208 (±0.4781)
       χeq          -6.6805 (±1.3849)
13     Jhete        -3.7404 (±2.2677)   35.2443  0.0752  0.5256  0.4781  11.078    9.6407
       Jhetp        -3.5538 (±1.0754)
       χeq          -5.9629 (±1.4125)
14     logP          0.5033 (±0.1608)   13.2045  0.0716  0.5699  0.5269  13.253   10.5434
       ⁴χ            1.0148 (±0.4358)
       χeq          -6.3307 (±1.3147)
14     Jhetp        -2.4300 (±0.7833)   23.6746  0.0677  0.6156  0.5771  16.012   11.5847
       logP          0.4901 (±0.1521)
       χeq          -6.1642 (±1.2395)
15     Jhete        -2.4651 (±2.0739)   29.0356  0.0673  0.6334  0.5829  12.527   11.8256
       Jhetp        -3.1240 (±0.9727)
       logP          0.4514 (±0.1545)
       χeq          -5.8245 (±1.2638)
16     Jhetp        -2.0627 (±0.7857)   18.1689  0.0655  0.6525  0.6046  13.615   12.3324
       logP          0.4911 (±0.1471)
       ⁴χ            0.7260 (±0.4133)
       χeq          -6.3240 (±1.2020)
17     Jhete       -11.4322 (±2.3692)   33.1638  0.0574  0.7332  0.6964  19.924   14.9176
       Jhetp        -4.5078 (±0.8445)
       ⁴χ            2.3568 (±0.4961)
       χeq          -5.3358 (±1.0854)
18     Jhete        -9.6436 (±2.2178)   28.6878  0.0515  0.7926  0.7555  21.399   17.2870
       Jhetp        -4.0804 (±0.7726)
       logP          0.3416 (±0.1207)
       ⁴χ            2.1044 (±0.4540)
       χeq          -5.2982 (±0.9741)
Table 5: Observed and estimated values of pIC50 (training set) using model no. 18

Compd. no.  Actual pIC50  Predicted pIC50  Residual
1           7.85          7.66              0.19
2           8.85          8.44              0.41
3           9.10          9.14             -0.04
5           8.15          7.49              0.66
6           6.78          7.53             -0.75
7           8.21          7.82              0.39
9           7.74          7.72              0.02
10          7.03          7.69             -0.66
11          7.92          7.55              0.37
13          5.28          5.78             -0.50
14          9.52          9.23              0.29
15          9.30          8.95              0.35
17          9.00          8.96              0.04
18          9.22          9.06              0.17
19          8.70          8.66              0.04
21          8.51          7.81              0.70
22          7.51          8.04             -0.53
23          8.62          8.35              0.27
25          8.14          8.16             -0.02
26          8.70          8.79             -0.09
27          8.82          8.95             -0.13
29          9.00          9.24             -0.24
30          8.96          8.81              0.16
31          8.54          8.96             -0.42
33          8.54          9.08             -0.54
34          9.22          8.58              0.64
35          9.15          8.59              0.56
37          8.68          8.98             -0.30
38          8.64          9.13             -0.49
39          6.89          6.96             -0.07
41          8.40          8.77             -0.37
42          8.44          8.61             -0.17
43          8.68          8.37              0.31
Table 6: Cross validation parameters for the proposed models

Model No.  Parameters used              PRESS   SSY      PRESS/SSY  R2cv    SPRESS  PSE
14         Jhetp, logP, χeq             9.6744  15.4904  0.6245     0.3755  0.5679  0.2845
17         Jhete, Jhetp, χeq, ⁴χ        6.714   18.4508  0.3639     0.6361  0.4812  0.1975
18         Jhetp, Jhete, χeq, logP, ⁴χ  5.2195  19.9452  0.2617     0.7383  0.4318  0.1535

Figure 1: Correlation between observed and estimated pIC50 using training set (model 18)

Table 7: Observed and estimated values of pIC50 of test set molecules using model 18

Compd. No.  Obs. pIC50  Est. pIC50  Residual
4           5.97        7.58        -1.61
8           7.61        8.29        -0.68
12          6.80        7.52        -0.72
16          9.22        9.11         0.11
20          8.54        8.68        -0.14
24          8.27        8.60        -0.33
28          8.46        8.87        -0.41
32          8.24        8.49        -0.25
36          9.00        8.84         0.16
40          7.29        7.76        -0.47
44          8.89        8.39         0.50
Figure 3. Correlation between observed and estimated pIC50 of test set by external cross validation using model 18